Abstract

The Thorp shuffle is defined as follows. Cut the deck into two equal piles. Drop the first card 
from the left pile or the right pile according to the outcome of a fair coin flip; then drop from 
the other pile. Continue this way until both piles are empty. We show that the mixing time for 
the Thorp shuffle with $2^d$ cards is polynomial in $d$.

1 Introduction 

1.1 The Thorp shuffle 

How many shuffles are necessary to mix up a deck of cards? We refer to this as the mixing time (see
Section 1.2 for a precise definition). The mathematics of card shuffling has been studied extensively
over the past several decades and most of the problems have been solved. Most famously, Bayer 
and Diaconis (in one of the few mathematical results to have made the front page of the New 
York Times) gave very precise bounds for the Gilbert-Shannon-Reeds (riffle) shuffle model. Their 
bounds were correct even up to the constant factors. For almost all natural shuffles matching upper 
and lower bounds are known (often even up to constants). However, one card shuffling problem 
has stood out for its resistance to attack. 

In 1973, Thorp introduced the following shuffling procedure. Assume that the number of
cards, n, is even. Cut the deck into two equal piles. Drop the first card from the left pile or the 
right pile according to the outcome of a fair coin flip; then drop from the other pile. Continue this 
way, with independent coin flips deciding whether to drop left-right or right-left each time, 
until both piles are empty. 
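
The procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `thorp_shuffle`, the list representation with index 0 as the bottom card, and the choice of the lower half of the list as the "left" pile are not taken from the paper.

```python
import random


def thorp_shuffle(deck, rng=random):
    """Return one Thorp shuffle of `deck` (a list; index 0 is the bottom card).

    Assumption for this sketch: the lower half of the list plays the role
    of the left pile and the upper half the role of the right pile.
    """
    n = len(deck)
    if n % 2:
        raise ValueError("the Thorp shuffle needs an even number of cards")
    half = n // 2
    left, right = deck[:half], deck[half:]  # cut into two equal piles
    out = []
    for a, b in zip(left, right):
        # A fair coin decides which pile drops first; the other pile drops next.
        if rng.random() < 0.5:
            out.extend((a, b))
        else:
            out.extend((b, a))
    return out
```

For example, `thorp_shuffle(list(range(8)), random.Random(0))` returns a rearrangement of `0..7` in which cards `i` and `i + 4` always occupy positions `2i` and `2i + 1` in some order.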

The Thorp shuffle, despite its simple description, has been hard to analyze. The problem
of determining its mixing time is, according to Persi Diaconis, the "longest-standing open card
shuffling problem." It has long been conjectured that the mixing time is $O(\log^c n)$ for some constant
$c$. However, despite much effort, the only known upper bounds are trivial ones of the form $O(n^c)$
that have circulated in the folklore. The main contribution of this paper is to give the first
polylogarithmic upper bound for the mixing time.

We shall assume that the number of cards is $2^d$ for a positive integer $d$. (Thus, our aim is to
prove that the mixing time is polynomial in $d$.) In this case the Thorp shuffle has a very appealing
alternative description. By writing the position of each card, from the bottom card ($0$) to the top
card ($2^d - 1$), in binary, we can view the cards as occupying the vertices of the $d$-dimensional unit
hypercube $\{0,1\}^d$. The Thorp shuffle proceeds in two stages. In the first stage, an independent
coin is flipped for each edge $e$ in direction 1 (i.e., each edge in the cube that connects two vertices
that differ in only the first coordinate). If the coin lands heads, the cards at the endpoints of $e$ are
interchanged; otherwise the cards remain in place. In the second stage, a "cyclic left bit shift" is
performed for each card, where the card in position $(x_1, \dots, x_d)$ is moved to $(x_2, \dots, x_d, x_1)$.
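
As an illustration of the two-stage description, the following Python sketch performs one shuffle on positions written in binary. The name `thorp_step_hypercube` and the convention that the first coordinate $x_1$ is the most significant bit of the position are assumptions made for this example.

```python
import random


def thorp_step_hypercube(cards, d, rng=random):
    """One Thorp shuffle in the hypercube picture.

    cards[p] is the card at position p, 0 <= p < 2**d.  Assumption for this
    sketch: coordinate x_1 is the most significant bit of p.
    """
    n = 1 << d
    top = 1 << (d - 1)
    cards = list(cards)
    # Stage 1: each edge in direction 1 swaps its endpoints with probability 1/2.
    for p in range(top):
        if rng.random() < 0.5:
            cards[p], cards[p | top] = cards[p | top], cards[p]
    # Stage 2: cyclic left bit shift (x_1, ..., x_d) -> (x_2, ..., x_d, x_1).
    out = [None] * n
    for p in range(n):
        q = ((p << 1) & (n - 1)) | (p >> (d - 1))
        out[q] = cards[p]
    return out
```

With this convention the two cards that start at positions $i$ and $i + 2^{d-1}$ end up at positions $2i$ and $2i+1$ in a uniformly random order, which matches the description in terms of piles.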

We will actually use a slightly modified definition of the Thorp shuffle. Say that an edge in the
hypercube rings if its endpoints are switched with probability $1/2$. For $j = 1, \dots, d$, let $K_j$ be the
transition kernel for the process in which every edge $e$ in direction $j$ rings.

Definition: Thorp shuffle. The Thorp shuffle is the Markov chain whose transition kernel at
time $n$ is $K_{j+1}$ if $j = n \bmod d$.

Since $d$ iterations of this shuffle are equivalent to $d$ iterations of the shuffle described by Thorp, it is
enough to prove a poly($d$) mixing time bound for this new model.
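
A minimal sketch of the modified model, assuming a list indexed by position: `ring_direction` and `thorp_round` are hypothetical helper names, and the bit assigned to direction $j$ is a convention chosen only for the example.

```python
import random


def ring_direction(cards, j, d, rng=random):
    """Apply K_j: every edge in direction j (1 <= j <= d) rings independently.

    Assumption for this sketch: direction 1 corresponds to the top bit of the
    position, direction d to the lowest bit.
    """
    bit = 1 << (d - j)
    cards = list(cards)
    for p in range(1 << d):
        # Visit each edge once, from its endpoint whose j-th coordinate is 0.
        if (p & bit) == 0 and rng.random() < 0.5:
            q = p | bit
            cards[p], cards[q] = cards[q], cards[p]
    return cards


def thorp_round(cards, d, rng=random):
    """One round: the kernel at time n is K_{j+1} with j = n mod d, n = 0..d-1."""
    for n in range(d):
        cards = ring_direction(cards, (n % d) + 1, d, rng)
    return cards
```

No bit shift is needed here: instead of rotating the coordinates after every step, the direction of the ringing edges advances from one step to the next.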

It is natural to consider the change in the deck after $d$ shuffles have been performed. (This represents
one complete "cycle".) We will call this a round. Using the language of network computing,
a round of the Thorp shuffle is like passing the cards through $d$ levels of a butterfly network (see,
e.g., Knuth's book), where at each stage neighboring cards are interchanged with probability $1/2$.
We note that in a recent breakthrough result, Cam showed that the matrix $K_1 \cdots K_d K_1 \cdots K_{d-1}$
has strictly positive entries. This can be viewed as a result about the "diameter" of the Thorp
shuffle; after a small number of steps there is a positive probability of being in any given state.
However, these probabilities are in general very small so this does not imply a good bound for the
mixing time.

The main result of this paper is that indeed the mixing time is polynomial in $d$. Our proof uses
evolving sets, a technique for bounding mixing times that was introduced by the author and Peres.
In another paper that uses some of the same ideas, a variant of evolving sets is
used to analyze the exclusion process. Evolving sets are related to the notion of strong stationary
duality due to Diaconis and Fill.

1.2 Statement of main result 

For a Markov chain on state space $V$ with uniform stationary distribution, define the (uniform)
mixing time by

$$\tau_{\mathrm{mix}} = \min\Bigl\{ n : \bigl|\, |V|\, p^n(x,y) - 1 \bigr| \le \tfrac14 \text{ for all } x, y \in V \Bigr\},$$

where $p^n(x,y)$ is the $n$-step transition probability from $x$ to $y$. (This is a stricter definition of
mixing time than the usual one involving total variation distance.)
Our main result is the following theorem. 

Theorem 1 The mixing time for the Thorp shuffle is $O(d^{44})$.

In similar fashion to the analysis in [S], we prove our mixing time bound based on an isoperimetric
function we call the root profile. The paper is organized as follows. Following a brief
introduction to evolving sets in Section 2, we devote much of the rest of the paper to proving a
bound on the root profile. In Section 3 we show how $\ell^2$ techniques can be combined with evolving
sets to give a bound on the root profile. In Section 4 we describe the chameleon process, a variant of
the Thorp shuffle in which the cards have changing colors, which is useful to bound mixing times.
In Section 5 we use the chameleon process to show that for a "reversibilized" version of the Thorp
shuffle, any collection of cards (if viewed as indistinguishable) mixes in poly($d$) time. In Section 6
we state the main technical result of this paper (proved in Appendix B), which says that the transition
kernel for the Thorp shuffle contracts functions in a certain $\ell^2$ sense; then we use this to obtain our
bound on the root profile. Next, armed with a good bound on the root profile, we prove Theorem
1 in Section 7. We conclude with proofs of some technical lemmas in the appendices.



2 Evolving sets 

We will now give a brief overview of evolving sets (see [S] for a more detailed account). Let
$\{p(x,y)\}$ be transition probabilities for an irreducible, aperiodic Markov chain on a finite state space
$V$. Assume that the chain has a uniform stationary distribution (which means that $p$ is doubly
stochastic: $\sum_{x \in V} p(x,y) = 1$ for all $y \in V$). For subsets $S \subset V$, define $p(S,y) := \sum_{x \in S} p(x,y)$.
Definition: Evolving sets. The evolving set process is the Markov chain $\{S_n\}$ on subsets of $V$
with the following transition rule. If the current state $S_n$ is $S \subset V$, choose $U$ uniformly from $[0,1]$
and let the next state $S_{n+1}$ be

$$\tilde S = \{y : p(S,y) > U\}.$$
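
The transition rule can be made concrete with a short Python sketch. Everything named here (`lazy_cycle`, `flow_into`, `evolving_step`, `expected_next_size`) is hypothetical, and the lazy walk on a cycle is just a convenient doubly stochastic example; exact rational arithmetic is used so that the identity $E|\tilde S| = \sum_y p(S,y) = |S|$ can be checked without rounding.

```python
from fractions import Fraction


def lazy_cycle(n):
    """Lazy simple random walk on the n-cycle: a doubly stochastic example chain."""
    p = [[Fraction(0)] * n for _ in range(n)]
    for x in range(n):
        p[x][x] += Fraction(1, 2)
        p[x][(x + 1) % n] += Fraction(1, 4)
        p[x][(x - 1) % n] += Fraction(1, 4)
    return p


def flow_into(p, S):
    """Return the list of p(S, y) = sum over x in S of p[x][y], for every y."""
    return [sum(p[x][y] for x in S) for y in range(len(p))]


def evolving_step(p, S, u):
    """Next evolving set {y : p(S, y) > u} for a threshold u in [0, 1]."""
    return {y for y, mass in enumerate(flow_into(p, S)) if mass > u}


def expected_next_size(p, S):
    """E|S~| = sum_y P(U < p(S, y)) = sum_y p(S, y) for U uniform on [0, 1]."""
    return sum(flow_into(p, S))
```

For the 6-cycle and $S = \{0, 1\}$ the masses $p(S,y)$ are $3/4, 3/4, 1/4, 0, 0, 1/4$, so a threshold of $1/2$ keeps the set at $\{0,1\}$ while a threshold of $1/8$ grows it to $\{0,1,2,5\}$.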

Write $P_S(\cdot) := P(\,\cdot \mid S_0 = S)$ and similarly for $E_S(\cdot)$. Evolving sets have the following
properties (see [S]).

1. The sequence $\{|S_n|\}_{n \ge 0}$ forms a martingale.

2. For all $n \ge 0$ and $x, y \in V$ we have

$$p^n(x,y) = P_{\{x\}}(y \in S_n).$$

3. The sequence of complements $\{S_n^c\}_{n \ge 0}$ is also an evolving set process, with the same transition
probabilities.

As in [S], we will prove our mixing time bound using an isoperimetric quantity that we denote
by $\psi$, which is defined as follows. For $S \subset V$, define

$$\psi(S) := 1 - E_S\left(\sqrt{\frac{|S_1|}{|S|}}\right).$$

Define $\psi(x)$ for $x \in [0, 1/2]$ by

$$\psi(x) = \inf\{\psi(S) : |S| \le x|V|\}, \qquad (1)$$

and for $x > 1/2$, let $\psi(x) := \psi_* = \psi(\tfrac12)$. Observe that $\psi$ is non-negative and (weakly) decreasing
on $[0, \infty)$. We will call the function $\psi$ the root profile.

3 From $\ell^2$ bounds to a bound on $\psi$

In this section, we show how to use $\ell^2$ techniques to obtain a bound on the root profile.

Let $p(x,y)$ be a doubly stochastic Markov chain on the state space $V$. For functions $f :
V \to [0,1]$, define $\|f\|_1 := \frac{1}{|V|}\sum_{x \in V} f(x)$ and $\|f\|_2 := \bigl(\frac{1}{|V|}\sum_{x \in V} f(x)^2\bigr)^{1/2}$. For $S \subset V$, define
$1_S : V \to [0,1]$ by

$$1_S(x) = \begin{cases} 1 & \text{if } x \in S; \\ 0 & \text{otherwise.} \end{cases}$$
Lemma 2 Let $\tilde S$ be the next step in the evolving set process starting from $S$, i.e., $\tilde S = \{y : p(S,y) >
U\}$, where $U$ is uniform. Let $a = \frac{\|p(S,\cdot)\|_2^2}{\|1_S\|_1}$. Then

$$E\left(\sqrt{\frac{|\tilde S|}{|S|}}\right) \le \bigl(a(2-a)\bigr)^{1/4}.$$

Proof: Let $A$ be an independent copy of $\tilde S$, i.e., $A = \{y : p(S,y) > U'\}$, for an independent
uniform random variable $U'$. Note that either $\tilde S \subset A$ or $A \subset \tilde S$ (depending on which of the uniform
variables $U$, $U'$ is larger). Let $X = |\tilde S \cap A|$ and $Y = |\tilde S \cup A|$. Then

$$\begin{aligned}
\Bigl(E\sqrt{|\tilde S|}\Bigr)^2 &= E\sqrt{|\tilde S|\,|A|} && (2)\\
&= E\sqrt{XY} && (3)\\
&\le \sqrt{E(X)E(Y)} && (4)\\
&= \sqrt{E(X)\,\bigl(2|S| - E(X)\bigr)}, && (5)
\end{aligned}$$

where the inequality is Cauchy-Schwarz and the last line follows from the fact that
$E(X + Y) = 2E|\tilde S| = 2|S|$. But



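$$\begin{aligned}
E(X) &= \sum_{y \in V} P(y \in \tilde S \cap A) && (6)\\
&= \sum_{y \in V} p(S,y)^2 && (7)\\
&= |V| \cdot \|p(S,\cdot)\|_2^2, && (8)
\end{aligned}$$

so dividing the LHS of (2) and the RHS of (5) by $|S| = |V| \cdot \|1_S\|_1$ and then taking a square root
yields the lemma. □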
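
As a sanity check, both sides of the lemma can be evaluated exactly for a small chain. The sketch below reads the lemma as $E\sqrt{|\tilde S|/|S|} \le (a(2-a))^{1/4}$ with $a = \|p(S,\cdot)\|_2^2/\|1_S\|_1$ and both norms normalized by $|V|$; the helper names and the lazy walk on a cycle are assumptions made for the example, not part of the paper.

```python
import math


def lazy_cycle(n):
    """Lazy simple random walk on the n-cycle (doubly stochastic), as floats."""
    p = [[0.0] * n for _ in range(n)]
    for x in range(n):
        p[x][x] += 0.5
        p[x][(x + 1) % n] += 0.25
        p[x][(x - 1) % n] += 0.25
    return p


def lemma2_sides(p, S):
    """Return (E sqrt(|S~|/|S|), (a(2-a))**(1/4)) for the set S."""
    n = len(p)
    mass = [sum(p[x][y] for x in S) for y in range(n)]  # p(S, y)
    # |S~| = #{y : p(S, y) > U} is a step function of U, so integrate exactly
    # over the intervals between consecutive values of p(S, y).
    cuts = sorted(set(mass) | {0.0, 1.0})
    lhs = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        size = sum(1 for m in mass if m > lo)  # |S~| for U in (lo, hi)
        lhs += (hi - lo) * math.sqrt(size / len(S))
    # The factors 1/|V| in the two norms cancel in the ratio.
    a = sum(m * m for m in mass) / len(S)
    rhs = (a * (2 - a)) ** 0.25
    return lhs, rhs
```

For the 6-cycle with $S = \{0,1\}$ the left side is $\tfrac12 + \tfrac{\sqrt2}{4}$ and $a = 5/8$, so the bound holds with some room.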

Remark: The same proof shows that if S = {y : f(y) > U} for / : V — > [0, 1] arbitrary, then 

a(2 — a) 



E 



< 



1 1 if* f II 2 

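where $a$ is the corresponding ratio (squared $\ell^2$ norm over $\ell^1$ norm) for $K$ the transition kernel. Note also that if we define $\lambda := 1 - a$, then

$$\bigl(a(2-a)\bigr)^{1/4} = (1 - \lambda^2)^{1/4} \le 1 - \frac{\lambda^2}{4}.$$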
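
The last step can be spelled out as follows, assuming the bound is read as $(a(2-a))^{1/4} \le 1 - \lambda^2/4$ with $\lambda = 1 - a$ and $a \in [0,1]$; the auxiliary variable $t$ is introduced only for this derivation.

```latex
\begin{align*}
a(2-a) &= (1-\lambda)(1+\lambda) = 1 - \lambda^2, \\
(1-t)^{1/4} &\le 1 - \tfrac{t}{4} \quad \text{for } t \in [0,1]
  \quad \text{(the concave function } s \mapsto s^{1/4}
  \text{ lies below its tangent line at } s = 1\text{)}, \\
\bigl(a(2-a)\bigr)^{1/4} &= (1-\lambda^2)^{1/4} \le 1 - \tfrac{\lambda^2}{4}
  \quad \text{(take } t = \lambda^2\text{)}.
\end{align*}
```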
4 Chameleon process

It will be convenient to study the card shuffle that behaves like the Thorp shuffle for the first $d$
steps ($K_1, \dots, K_d$), and then like a "reverse Thorp shuffle" for the next $d$ steps ($K_d, \dots, K_1$). We
will call this the zigzag shuffle. Every $2d$ steps of the zigzag shuffle will be called a round. (So a
round of the zigzag shuffle is a round of the Thorp shuffle followed by a round of a time-reversed
Thorp shuffle.)

Let $a$ be large enough so that $4a^{-d} \le 2^{-d-1} 4^{-d}$ for all $d \ge 1$ and let $c$ be an integer large
enough so that $[4e^{-c}]^d\, \beta \log(a)\, c d^5 \le a^{-d}$ for all $d \ge 1$, where $\beta = 2056 \cdot 64 \cdot 5$.

The chameleon process is an extension of the zigzag shuffle. The cards move in the same way
as in the zigzag shuffle, but they also have colors, which can be red, white, black or pink. Initially,
the cards are colored as follows. There is a sequence of cards $x_1, \dots, x_b$ for some $b \ge 2^{d-1}$ such that
cards $x_1, \dots, x_{b-1}$ are colored white, card $x_b$ is colored red, and the remaining cards are colored
black. The cards can change color in two ways. The first way is called pinkening, which takes place
when an edge connecting a red card to a white card rings; in this case both cards are re-colored
pink. The second way is called de-pinking, which takes place at the end of every $64cd$ rounds of
shuffling; in this case all of the pink cards are collectively re-colored red or white, with probability
$1/2$ each. (A process of this type was first used to analyze the exclusion process.) Note that
black cards can never change color.

Let $X_n$ be the zigzag shuffle. For $j = 1, \dots, 2^d$, we will write $X_n(j)$ for the position of card
$j$ at time $n$. If $S = \{z_1, \dots, z_k\}$ is a set of cards, define $X_n(S) = \{X_n(z_1), \dots, X_n(z_k)\}$. Let
$W_n = X_n(\{1, \dots, b\})$ be the unordered set of locations of nonblack (i.e., white, red or pink) cards
at time $n$. For vertices $x$ in the hypercube, define

$$\rho_n(x) = \mathbf{1}(\text{there is a red card at } x \text{ at time } n) + \tfrac12\, \mathbf{1}(\text{there is a pink card at } x \text{ at time } n).$$
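
The two recoloring rules and the quantity $\sum_x \rho_n(x)$ can be illustrated with a small Python sketch. It tracks colors only: the movement of the cards is deliberately left out, and the names `ring`, `depink` and `red_paint` are hypothetical.

```python
from fractions import Fraction


def ring(colors, x, y):
    """Pinkening: if the ringing edge {x, y} joins a red and a white card,
    both cards become pink.  (Card movement is omitted in this sketch.)"""
    if {colors[x], colors[y]} == {"red", "white"}:
        colors[x] = colors[y] = "pink"


def depink(colors, heads):
    """De-pinking: all pink cards become red (heads) or white (tails) together."""
    new_color = "red" if heads else "white"
    for x, color in enumerate(colors):
        if color == "pink":
            colors[x] = new_color


def red_paint(colors):
    """Sum over positions of rho: 1 per red card plus 1/2 per pink card."""
    weight = {"red": Fraction(1), "pink": Fraction(1, 2)}
    return sum(weight.get(color, 0) for color in colors)
```

Pinkening leaves the total amount of red paint unchanged (a red and a white card contribute $1 + 0$ before and $\tfrac12 + \tfrac12$ after), while de-pinking $P$ pink cards moves the total up or down by $P/2$.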

The following lemma indicates the fundamental relationship between the chameleon process and 
the zigzag shuffle. 

Lemma 3 Consider the chameleon process with $b$ nonblack cards. Then

$$P\bigl(X_n(x_b) = x \mid W_1, W_2, \dots\bigr) = E\bigl(\rho_n(x) \mid W_1, W_2, \dots\bigr).$$



Proof: We will use induction on $n$. The base case $n = 0$ is trivial because there is initially only one
red card, which is located at the position of card $x_b$. Now assume that the result holds for $n$. Let $e$ be
the edge incident to $x$ that rings at time $n$ and let $x'$ be the neighbor of $x$ across $e$. Let $A_1$, $A_2$ and $A_3$
be the events corresponding to the following three possible values of $\bigl(W_n \cap \{x, x'\},\, W_{n+1} \cap \{x, x'\}\bigr)$
when $x \in W_{n+1}$:

1. $(\{x,x'\},\{x,x'\})$;

2. $(\{x'\},\{x\})$;

3. $(\{x\},\{x\})$.

Let $\mathcal{F}_n = \sigma(\rho_n(x), \rho_n(x'))$. Note that

$$E\bigl(\rho_{n+1}(x) \mid \mathcal{F}_n, W_1, W_2, \dots\bigr) = \bigl(\tfrac12\rho_n(x) + \tfrac12\rho_n(x')\bigr)\mathbf{1}(A_1) + \rho_n(x')\mathbf{1}(A_2) + \rho_n(x)\mathbf{1}(A_3). \qquad (9)$$

Define $\mu_n(\cdot) = P\bigl(X_n(x_b) = \cdot \mid W_1, W_2, \dots\bigr)$. Then

$$\mu_{n+1}(x) = \bigl(\tfrac12\mu_n(x) + \tfrac12\mu_n(x')\bigr)\mathbf{1}(A_1) + \mu_n(x')\mathbf{1}(A_2) + \mu_n(x)\mathbf{1}(A_3). \qquad (10)$$

But by induction we have

$$\mu_n(x) = E\bigl(\rho_n(x) \mid W_1, W_2, \dots\bigr); \qquad \mu_n(x') = E\bigl(\rho_n(x') \mid W_1, W_2, \dots\bigr).$$

To complete the proof, take the conditional expectation given $W_1, W_2, \dots$ of both sides of (9) and
combine with equation (10). □



Remark: Note that

$$\sum_x E\bigl(\rho_n(x) \mid W_1, W_2, \dots\bigr) = \sum_x P\bigl(X_n(x_b) = x \mid W_1, W_2, \dots\bigr) = 1. \qquad (11)$$



5 Indistinguishable cards mix in poly time

Let $A$ be a set of cards. Then the process $\{X_n(A) : n \ge 0\}$ is a Markov chain. The following lemma
says that the uniform mixing time for this chain is $O(d^5)$.

Lemma 4 There is a universal constant $b \in \mathbf{Z}$ such that if $m = bd^5$ then

$$\max_{A, A'} \left| \binom{2^d}{|A|} P(A \to_m A') - 1 \right| \le \frac14, \qquad (12)$$

where we write $A \to_m A'$ for the event that $X_m(A) = A'$.

Proof: It is enough to consider sets $A$ with $|A| \ge 2^{d-1}$. (Otherwise, consider $A^c$.) Let $a$, $\beta$ and $c$
be defined as in Section 4, let $b \ge \beta \log(a)\, c$, and let $m = bd^5$. For $j \in \{1, \dots, 2^d\}$ define

$$\lambda(j) = \max_{|S| = j}\, \max_{|S'| = j} \left| \binom{2^d}{j} P(S \to_m S') - 1 \right|.$$

We will show that for all $k \ge 2^{d-1}$, we have

$$\lambda(k) \le k^* 4^{-d}, \qquad (13)$$

where $k^* = 2^d - k$. This yields the lemma because the r.h.s. of (13) is at most $\frac14$ for all $d \ge 1$.

Let $A$ and $B$ be disjoint sets of cards. For $x \in A$, say that $x$ is antisocial in round $j$ of the zigzag
shuffle if at no point in round $j$ does an edge connecting $x$ to a card in $B$ ring. Let $Z(A,B,j)$ denote
the number of cards that are antisocial in round $j$. We say that $A$ avoids $B$ if $Z(A,B,j) \ge \frac78|A|$
for $64cd$ consecutive rounds $j$ before time $m$. If $S$ is a set of cards, say that $S$ mixes if there do not
exist disjoint sets $A$, $B$ of cards with $|A| \le \frac12|S|$ and $A \cup B = S$ such that $A$ avoids $B$.

We will verify (13) by induction on $k^*$. The base case $k^* = 0$ ($k = 2^d$) is trivial. Suppose it's
true for $k$, where $k > 2^{d-1}$, and consider $k - 1$. Fix a set of cards $S = \{x_1, \dots, x_{|S|}\}$ and consider
the corresponding chameleon process. Let $\mathcal{F} = \sigma(X_n(S) : n \ge 0)$. Let $Z_n = \sum_x \rho_{64cd^2 n}(x)$ be the
total amount of "red paint" in the system after $64cdn$ rounds of the chameleon process. Define
$Z_n^\sharp = \min(Z_n, k - Z_n)$. Note that $\lim_{n \to \infty} Z_n^\sharp = 0$ a.s.

Fix $n$ such that $64cd^2 n \le m$, and let $A_n$ be either the set of cards that are red or the set of cards
that are white at the start of round $64cdn$, according to whether $Z_n \le k/2$ or $Z_n > k/2$, respectively.
Let $P$ denote the number of cards pinkened during the next $64cd$ rounds. Let $B_n = S - A_n$. When
$S$ mixes, $A_n$ doesn't avoid $B_n$. We claim that this ensures that $P \ge |A_n|/(8d)$. Consider a round $j$ such
that $Z(A_n, B_n, j) < \frac78|A_n|$. Note that after an edge connecting a card $x$ in $A_n$ to a card $y$ in $B_n$
rings, at least one of the resulting cards is pink. Let us associate that pink card with $x$. (If both
endpoints are pink then choose one of them arbitrarily.) Since at least a fraction $1/8$ of the cards
in $A_n$ will have a pink card associated to them in this round, and since any given pink card can
be associated to at most $d$ cards in $A_n$ in this round, the number of pink cards at the end of this
round must be at least $|A_n|/(8d)$. It follows that $P \ge |A_n|/(8d)$.



Note that Z n+ i is either Z n + |P or Z n — with probability | each. Thus, if we write E for 
the event that S does not mix, then 



B(Jzl^\p,Z n ,T,E c ) = E(y(Z n + \Pf + y(Z n - \Pf I Z n ,F,E 



< 



1 + 



16d 



+ \1 



1 
16d 



< V Zi exp 



2056d 2 



(14) 

(15) 
(16) 



where the first inequality follows from the concavity of the square root, and the second inequality
follows from the fact that $\frac12\sqrt{1+u} + \frac12\sqrt{1-u} \le \exp(-u^2/8)$ whenever $u \in [0,1]$ (see [S], Lemma
9).

Thus, since $Z_0 = 1$, it follows that



El 



{\fz~t J 7 , S mixes^j < exp 



n 



2056d 2 



for all n. Define Z^ = lim^^oo Z n . (Note that for any S' we have E(Zoo | S— > m S') = 
remark immediately following Lemma EJ) Lemma implies that for all y € S' we have 



S >rn S 



< E 



(\pm{y) - \ 



S > m S J 



< P(p m ${0,k}\S^ m S'). 



Let E be the event that S does not mix. LemmaElin Appendix A gives P(E \ S— > m S') < 
Hence 



(17) 
1; see the 

(18) 

(19) 
(20) 



d l+A(fc) 
a 1— X(jfe) • 



P( P m^{0,k}\S^ m S') < P(E\S^ m S') +P( Pm i {0,k} 



< a 



< 3a- d + P(p m ^{0,A;} S^ m S',E c ), 



(21) 
(22) 
(23) 
where the third inequality holds because $\lambda(k) \le \frac12$ by induction. But
p(p m ^{0,k}\s^ m S',E c ) < B(zl Mcd2 



< exp 



rn 



2056 • 64 • cd A 



< a 



(24) 
(25) 



where the second inequality follows from equation (|17f) . Combining equations (|2*U)l . (|2*3*)) . and 1)25(1 

gives 



P(X m (x fc ) 



S > m S 



< 4a" 



(26) 



Now fix a set of cards $A$ with $|A| = k - 1$ and let $z \notin A$. Define $A_z = A \cup \{z\}$. Fix a set $A'$ of
vertices of the hypercube with $|A'| = k - 1$. For $w \notin A'$, define



p(a 



^ A' 



Z^mW 



A — ► A' 



Note that 



: u; ^ A'} 



+ i and £±1^ 



Ax w — ( ) 

Ay™ = Uw - l/k. 

-l 



(27) 
(28) 



/ 2" 

Kk-i) 



It follows that 



P(A- m A') - 



|£p(a 

w&A' 



1 /2 



(*) 



1 /2 d 



-1 



| Ax w ± + Ay w (X) + Ax w Ay v 

w£A' 



Note that 



ax w | < r4- d ( 2 ;) - < Q 



-1 



-1 



(29) 
(30) 
(31) 

(32) 



where the first inequality is induction and the second inequality holds because k* < 2 d . Also, 
equation (|26|) implies that 

\Ay w \ < Aa~ d < ±4~ d , (33) 

for all d > 1 by the definition of a. Thus, using equations (|3*2*)) . (|33j) and the triangle inequality, 
equation (|3*Tj) becomes 



P(A- m A')-(£)' 



< 



k* + l 



kH -df2 



+ 4 



-d/2 d 



1 



(k* + 1) 4 



2 / ,-d/'2' 



= (^+i)4- d ( fc 2 _ d 1 )- 1 = (fc-ir4^( fe 2 _ d 1 )- 1 . 

Since this is true for all $A$ with $|A| = k - 1$ the proof is complete. □



Let $K$ be the transition kernel for one round of the Thorp shuffle, and let $K^t$ be the transpose
of $K$, defined by $K^t(x,y) = K(y,x)$. Note that $K^t$ is the time-reversal of $K$. Let $\bar K := KK^t$
be the transition kernel for one round of the zigzag shuffle. Let $\{Z_n : n \ge 0\}$ be a Markov chain
with transition kernel $\bar K$. Then Lemma 4 implies that for any set of cards $B$, the uniform mixing
time for the process $\{Z_n(B) : n \ge 0\}$ is at most $bd^4$. Thus, using standard facts about geometric
convergence and the uniform mixing time, we can conclude that for a universal constant $C$ we have



for all k > 1. 

Truncated Thorp shuffle. Fix $d_* \le d$. Define the $d_*$-truncated Thorp shuffle as the Markov
chain with transition kernel $K_* = K_1 \cdots K_{d_*}$. This is a "partial round" of the Thorp shuffle, with
steps $d_* + 1$ through $d$ censored. To make things irreducible, we define the state space as the set of
states reachable from an (arbitrary) fixed starting state.

Define the $d_*$-truncated zigzag shuffle as the Markov chain with transition kernel $K_* K_*^t$. Note
that we can think of this shuffle as a product of $2^{d-d_*}$ copies of a "$d_*$-dimensional" zigzag shuffle,
where the cards occupy $2^{d-d_*}$ (disconnected) hypercubes of dimension $d_*$. Combining this
observation with equation (34) yields the following corollary to Lemma 4.

Corollary 5 Fix $d_* \ge 2$ and let $\{Z_n : n \ge 0\}$ be the $d_*$-truncated zigzag shuffle. There is a
universal constant $c$ such that if $l = kcd(d_* - 1)^4$, then




(34) 




(35) 



for all k > 1. 



Proof: Let c = 2 5 C. Then / > 2kdCd A , 



so equation 1)34(1 implies that 



max 
B> 



(,S,)P(Z I (B) = B / ) < (l + e-^f 



< exp(2 d exp(-2(f£;)) 

< exp(exp(— k)), 



for all d > 1. 



□ 



6 A bound on the root profile 



We will need the following technical result, which is proved in Appendix B.

Corollary 13 Fix $S \subset V$ and let $x = \frac{|S|}{|V|} = \|1_S\|_1$. Let $p(\cdot\,,\cdot)$ be the transition kernel for
one round of the Thorp shuffle. Then there is a universal constant $C > 0$ such that

$$\|p(S,\cdot)\|_2^2 \le x^{1 + C/d^{14}}.$$

We are now ready to obtain a bound on the root profile of the Thorp shuffle.
Lemma 6 Let $\psi$ be the root profile of the Markov chain which at each step performs a round of the
Thorp shuffle ($K_1 K_2 \cdots K_d$). There is a universal constant $c > 0$ such that

$$\psi(x) \ge \max\bigl(1 - x^{c/2d^{42}},\; c\,d^{-28}\bigr). \qquad (36)$$

Proof: Let $C$ be the constant appearing in Corollary 13. We will show that there is a universal
constant $B > 0$ such that

$$\psi(x) \ge \max\bigl(1 - x^{CB/2d^{42}},\; B\,d^{-28}\bigr). \qquad (37)$$

Setting $c = \min(BC, C)$ will then yield the lemma. First, we show that $\psi_* \ge Bd^{-28}$. Fix $S$ with
$\frac{|S|}{|V|} = x \le \frac12$ and let

$$\tilde S = \{y : p(S,y) > U\},$$

where $\{p(x,y)\}$ are the transition probabilities for one round of the Thorp shuffle. The remark
following Lemma 2 implies that

$$E\left(\sqrt{\frac{|\tilde S|}{|S|}}\right) \le 1 - \frac{\lambda^2}{4},$$

where $\lambda = 1 - \frac{\|p(S,\cdot)\|_2^2}{\|1_S\|_1}$, and Corollary 13 implies that $\|p(S,\cdot)\|_2^2 \le x^{C/d^{14}} \|1_S\|_1 \le 2^{-C/d^{14}} \|1_S\|_1$.
Thus

$$\begin{aligned}
\lambda &\ge 1 - 2^{-C/d^{14}} && (38)\\
&= 1 - e^{-C \log 2 \cdot d^{-14}} && (39)\\
&\ge A\,d^{-14}, && (40)
\end{aligned}$$

for a universal constant $A > 0$, and hence $1 - \frac{\lambda^2}{4} \le 1 - Bd^{-28}$ for a universal constant $B \in (0, \frac14)$.
(The fact that we can take $B < \frac14$ will be used later on.) Since this holds for all $S$ with $|S| \le \frac12 (2^d)!$,
we conclude that $\psi_* \ge Bd^{-28}$. To complete the proof of Lemma 6 we must show that equation
(37) holds when the max is achieved by the first term. Suppose that $1 - x^{CB/2d^{42}} \ge Bd^{-28}$. Then

$$x \le (1 - Bd^{-28})^{2d^{42}/CB} \le \exp(-2C^{-1}d^{14}). \qquad (41)$$

Assume that (41) holds. Lemma 2 gives

$$E\left(\sqrt{\frac{|\tilde S|}{|S|}}\right) \le \bigl(a(2-a)\bigr)^{1/4} \le (2a)^{1/4}, \qquad (42)$$

where $a = \frac{\|p(S,\cdot)\|_2^2}{\|1_S\|_1}$. Equation (41) implies that

$$x^{C/2d^{14}} \le e^{-1} \le \tfrac12,$$

and hence

$$2 \le x^{-C/2d^{14}}. \qquad (43)$$

Furthermore, Corollary 13 implies that $a \le x^{C/d^{14}}$. Plugging this and (43) into (42) gives

$$E\left(\sqrt{\frac{|\tilde S|}{|S|}}\right) \le \bigl(x^{-C/2d^{14}}\, x^{C/d^{14}}\bigr)^{1/4} = x^{C/8d^{14}} \le x^{CB/2d^{42}}, \qquad (44)$$

since $B \le \frac14$ (and $x \le 1$).
7 Proof of main result

Proof of Theorem 1: We shall start by bounding the mixing time of the Markov chain that
does an entire round of the Thorp shuffle each step. Recall that the root profile $\psi : [0, \infty) \to \mathbf{R}$ is
defined by

$$\psi(x) = \begin{cases} \inf\{\psi(S) : |S| \le x|V|\} & \text{if } x \in [0, \tfrac12]; \\ \psi_* & \text{if } x > \tfrac12, \end{cases}$$

where $\psi_* = \psi(\tfrac12)$. Thus $\psi$ is (weakly) decreasing on $[0, \infty)$.

Let $h(z) := 1 - \psi(1/z^2)$. Since $\psi(x) = \psi_*$ for all real numbers $x \ge \frac12$, the function $h$ is well-defined
even for $z < 1$. Note that $h$ is nonincreasing. In the evolving sets paper of the author and Peres it is shown (see section 5 and the part
of section 3 entitled "Derivation of Theorem 1 from Lemma 3 and Theorem 4") that there is a
sequence of random variables $\{Z_n : n \ge 0\}$ that satisfies $Z_0 = \sqrt{|V|}$ and

$$E\left(\frac{Z_{n+1}}{Z_n} \,\Bigm|\, Z_n\right) \le h(Z_n), \qquad (45)$$

such that

$$\tau_{\mathrm{mix}} \le 2 \min\{n : E(Z_n) \le \tfrac12\}. \qquad (46)$$

Lemma 6 gave the following bound on the root profile:

$$\psi(x) \ge \max\bigl(1 - x^{c/2d^{42}},\; c\,d^{-28}\bigr), \qquad (47)$$

for a universal constant $c > 0$. Thus $h \le g$, where $g$ is defined by

$$g(z) = \min\bigl(z^{-c/d^{42}},\; 1 - c\,d^{-28}\bigr),$$

and hence $E(Z_{n+1} \mid Z_n) \le g(Z_n) Z_n$. Let $f(z) = z g(z) = \min\bigl(z^{1 - c/d^{42}},\, z(1 - cd^{-28})\bigr)$. Note that $f$
is increasing and, as the minimum of two concave functions, is concave. We claim that $E(Z_n) \le
f^n(Z_0)$, where $f^n$ is the $n$-fold iterate of $f$. We verify this by induction. The base case $n = 0$ is
immediate. Suppose that the claim holds for $n$. Then

$$\begin{aligned}
E(Z_{n+1}) &= E\bigl(E(Z_{n+1} \mid Z_n)\bigr) && (48)\\
&\le E\bigl(f(Z_n)\bigr) && (49)\\
&\le f\bigl(E(Z_n)\bigr) && (50)\\
&\le f\bigl(f^n(Z_0)\bigr) = f^{n+1}(Z_0), && (51)
\end{aligned}$$

where the third line follows from concavity and the last line is the induction hypothesis. Let

$$f_1(z) = z^{1 - c/d^{42}}; \qquad f_2(z) = z(1 - c\,d^{-28}),$$

so that $f = \min(f_1, f_2)$. Then for all $m, n$ we have

$$E(Z_{m+n}) \le f^{m+n}(Z_0) \le f_2^m\bigl(f_1^n(Z_0)\bigr).$$

But $f_1^n(z) = z^{(1 - c/d^{42})^n} \le z^{\exp(-cn/d^{42})}$ and $Z_0 = \sqrt{(2^d)!} \le (2^d)^{2^d} = 2^{d 2^d}$. Thus, choosing $n \ge c^{-1}d^{43}$
gives

$$f_1^n(Z_0) \le 2^{d 2^d e^{-d}},$$

which is at most $4$ for all $d \ge 1$. Finally, since

$$f_2^m(z) = z(1 - c\,d^{-28})^m \le z\,e^{-cm/d^{28}},$$

we have $f_2^m(4) \le 4e^{-cm/d^{28}}$, which is at most $\frac12$ whenever $m \ge c^{-1}d^{28}\log 8$. Putting this together,
we conclude that $\tau_{\mathrm{mix}} \le 2c^{-1}(d^{43} + d^{28}\log 8) = O(d^{43})$. Since each round corresponds to $d$ Thorp
shuffles we conclude that the mixing time for the original model is $O(d^{44})$. □



8 Appendix A 

In this section we prove some large deviation results needed in Section 5. We will adopt the notation
of that section; for the convenience of the reader, we now give a brief recap. Let $A$ and $B$ be disjoint
sets of cards. For $x \in A$, say that $x$ is antisocial in round $j$ of the zigzag shuffle if at no point in
round $j$ does an edge connecting $x$ to a card in $B$ ring. Let $Z(A,B,j)$ denote the number of cards
that are antisocial in round $j$. We say that $A$ avoids $B$ if $Z(A,B,j) \ge \frac78|A|$ for $64cd$ consecutive
rounds $j$ before time $m$. If $S$ is a set of cards, say that $S$ mixes if there do not exist disjoint sets
$A$, $B$ of cards with $|A| \le \frac12|S|$ and $A \cup B = S$ such that $A$ avoids $B$.

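Lemma 7 Let $\{X_n : n \ge 0\}$ be the zigzag shuffle. Let $Z = Z(A,B,1)$ be the number of cards
that are antisocial in the first round. Define $\mathcal{F}_B = \sigma(X_1(B), \dots, X_d(B))$. Let $p = 1 - \frac{|B|}{2^d}$ and let
$k = |A|$. For $\theta \ge 0$ define $\Phi_p(\theta) = 1 - p + pe^{\theta}$. Then for all $\theta \ge 0$ we have

$$E\bigl(e^{\theta Z} \mid \mathcal{F}_B\bigr) \le \Phi_p(\theta)^k. \qquad (52)$$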



Proof: We verify this by induction on $d$. If $d = 1$ then the LHS of (52) is $1$ if $p < 1$, and $e^{\theta k}$
otherwise, so (52) holds. Now suppose that $d > 1$. Let $A'$ be the set of cards in $A$ not adjacent to
$B$ in direction 1, and let $k' = |A'|$. Let $l$ be half the number of cards in $A'$ adjacent to another card
in $A'$ in direction 1. (Note that $l$ is an integer.) Let $k_0$ and $k_1$ be the number of cards in $A'$ that
end up with a leading $0$ and $1$, respectively, after the first step of the round (i.e., after the edges
in direction 1 ring). Of those in the first group, let $Z_0$ be the number that are antisocial, with a
similar definition for $Z_1$. Note that given $\mathcal{F}_B$ the random variables $k_0$ and $k_1$ are both distributed
like $W + l$, where $W \sim \mathrm{Binomial}(k' - 2l, \tfrac12)$, and note that $Z = Z_0 + Z_1$. By induction, we have

$$\begin{aligned}
E\bigl(e^{\theta Z} \mid \mathcal{F}_B, X_1(A)\bigr) &= E\bigl(e^{\theta Z_0} \mid \mathcal{F}_B, X_1(A)\bigr)\, E\bigl(e^{\theta Z_1} \mid \mathcal{F}_B, X_1(A)\bigr) \\
&\le \Phi_{p_0}(\theta)^{k_0}\, \Phi_{p_1}(\theta)^{k_1},
\end{aligned}$$

where $p_0$ is the fraction of locations of the part of the hypercube with a leading $0$ not occupied by a
card in $B$ after the first step, with a similar definition for $p_1$. It follows that $E\bigl(e^{\theta Z} \mid \mathcal{F}_B, k_0, k_1\bigr) \le
\Phi_{p_0}(\theta)^{k_0}\, \Phi_{p_1}(\theta)^{k_1}$. Hence

$$\begin{aligned}
E\bigl(e^{\theta Z} \mid \mathcal{F}_B\bigr) &\le \sum_{i=0}^{k'-2l} \binom{k'-2l}{i} 2^{-(k'-2l)}\, \Phi_{p_0}(\theta)^{i}\, \Phi_{p_1}(\theta)^{k'-2l-i}\, \Phi_{p_0}(\theta)^{l}\, \Phi_{p_1}(\theta)^{l} \\
&= \bigl[\tfrac12\Phi_{p_0}(\theta) + \tfrac12\Phi_{p_1}(\theta)\bigr]^{k'-2l}\, \Phi_{p_0}(\theta)^{l}\, \Phi_{p_1}(\theta)^{l} \\
&\le \bigl[\tfrac12\Phi_{p_0}(\theta) + \tfrac12\Phi_{p_1}(\theta)\bigr]^{k'} = \Phi_p(\theta)^{k'},
\end{aligned}$$

where the last inequality follows from the AM-GM inequality and the final equality holds because
$p = \frac12(p_0 + p_1)$. This yields the lemma because $k' \le k$. □

Lemma 7 easily gives the following large deviation inequality.

Corollary 8 Suppose that $p \le 3/4$. Then

$$P\bigl(Z \ge \tfrac78 k \mid \mathcal{F}_B\bigr) \le e^{-k/64}.$$

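Proof: We have

$$\begin{aligned}
E\bigl(e^{\theta(Z - pk)} \mid \mathcal{F}_B\bigr) &= e^{-pk\theta}\, E\bigl(e^{\theta Z} \mid \mathcal{F}_B\bigr) && (53)\\
&\le \bigl[(1-p)e^{-p\theta} + p\,e^{\theta(1-p)}\bigr]^k, && (54)
\end{aligned}$$

by Lemma 7. The quantity inside the square brackets is $E\bigl(e^{\theta(Y - p)}\bigr)$, for a Bernoulli($p$) random
variable $Y$. The inequality $E(e^W) \le e^{\mathrm{var}(W)}$, valid when $E(W) = 0$ and $W \le 1$,
implies that the quantity (54) is at most $\exp\bigl(\tfrac14\theta^2 k\bigr)$ if $\theta \le 1$. Letting $\theta = \frac14$ gives

$$E\bigl(\exp[\tfrac14(Z - pk)] \mid \mathcal{F}_B\bigr) \le e^{k/64}, \qquad (55)$$

and hence

$$\begin{aligned}
P\bigl(Z \ge \tfrac78 k \mid \mathcal{F}_B\bigr) &= P\Bigl(\exp[\tfrac14(Z - pk)] \ge \exp\Bigl[\tfrac{7k}{32} - \tfrac{pk}{4}\Bigr] \Bigm| \mathcal{F}_B\Bigr) && (56)\\
&\le e^{k/64} \exp\Bigl[-\tfrac{7k}{32} + \tfrac{pk}{4}\Bigr] && (57)
\end{aligned}$$

by Markov's inequality. Finally, since $p \le 3/4$, the quantity (57) is at most $e^{-k/64}$.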
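
Lemma 7 bounds the moment generating function of $Z$ by $(1 - p + pe^{\theta})^k$, which is the moment generating function of a Binomial($k$, $p$) variable, so the Binomial case is a natural benchmark for the tail bound. The sketch below compares the exact Binomial tail $P(\mathrm{Bin}(k,p) \ge 7k/8)$ with $e^{-k/64}$; the function names are hypothetical and treating $Z$ as exactly Binomial is an assumption made only for this illustration.

```python
import math


def binomial_upper_tail(k, p, threshold):
    """P(Bin(k, p) >= threshold), summed exactly term by term."""
    start = math.ceil(threshold)
    return sum(
        math.comb(k, i) * p**i * (1 - p) ** (k - i)
        for i in range(start, k + 1)
    )


def tail_and_bound(k, p=0.75):
    """Return (P(Bin(k, p) >= 7k/8), exp(-k/64)) for comparison."""
    return binomial_upper_tail(k, p, 7 * k / 8), math.exp(-k / 64)
```

At the extreme case $p = 3/4$ the exact tail stays below $e^{-k/64}$ for every $k$ tried, and the gap widens quickly as $k$ grows.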
The following lemma was used in the proof of Lemma 4 in Section 5.



Lemma 9 Fix a set of cards $S$ with $|S| \ge 2^{d-1}$. Then for any set $S'$ of vertices of the hypercube
we have

$$P\bigl(S \text{ does not mix} \mid S \to_m S'\bigr) \le a^{-d}\, \frac{1 + \lambda(|S|)}{1 - \lambda(|S|)}.$$

Proof: Let E be the event that S does not mix. We have 



p(-E,S-> TO S') < 12 P ( A avoids B,S^ m S' 



fc <l| S | A:\A\=k 



< 2 d-1 max 



2 dk max P(A avoids B,S^ m S' 
A:\A\=k V 



where in the summations we write B for S — A, the 2 is an upper bound on the number of 
k < t%\S\, and the 2 dk is an upper bound on the number of sets A with \A\ = k. Since \A\ < ^\S\ 
and All B = S,we must have > \2 d . Hence if |A| = k then 

P(4 avoids B,S^ m S') < p(a avoids B,B^ m B',A^ m A' 

B'CS' 

< J2 P(B^ m B')P (a avoids B 



B'CS' 



B^ m B'^j , 



(58) 
(59) 
where in the summations, we write $A'$ for $S' - B'$. But

m i+64cd— 1 



'(a avoids B I B^ m B') < ^ J] p(z(AB, j) > ™ I B^ m B'^ < m(V fc / 64 ) " , (60) 



64cd 

i=0 j'=i 

where the last inequality follows from Corollary Hence 

P(A avoids B,S^ m S'^ < ^ P(B^ m B')me- ckd 

B'CS' 

< 2 dk me- ckd maxP(B^ m B'), 

B> 

where the 2 dk is an upper bound on the number of subsets B' C S' . But for any B' we have 



P(B- m fl') < £ P(S^ m S) < 2*(*1) '(1 + A(|S|)). 



It follows that 



avoids B,S-> m S') < A dk me- Cdk (^) 1 (1 + X(\S\)) (61) 



[^l^logacrf 5 ^) ~(1 + A(|S|)) (62) 

-l 



< a- d (* sl ) "(1 + A(|5|)), (63) 



where the second inequality follows from the definition of c. Finally, since P(5— > m S") > (i^.) (1 — 
X(\S\)), we get P(A avoids B I S^ m S') < a~ d ^j|| | j - □ 



9 Appendix B 

The purpose of this section is to prove Corollary 13, which is used to bound the root profile. If $K$
is the transition kernel for a Markov chain on the state space $V$, we will consider $K$ as an operator
acting on the space of functions $f : V \to \mathbf{R}$ by

$$Kf(x) = \sum_{y \in V} K(x,y) f(y). \qquad (64)$$

We will need the following lemma, which was proved by Yuval Peres.

Lemma 10 Let $K$ be a doubly stochastic transition kernel and define $\bar K = KK^t$. For any function
$g : V \to [0,1]$ and $n \ge 1$ we have

$$\|K^t g\|_2^2 \le \langle g, g \rangle^{1 - 1/n}\, \langle \bar K^n g, g \rangle^{1/n}.$$
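Proof: Since $\bar K$ is symmetric it is diagonalizable. Thus we can write $g = \sum_i a_i g_i$, where the $g_i$
are orthonormal eigenfunctions of $\bar K$ with corresponding eigenvalues $\lambda_i$. We have

$$\begin{aligned}
\frac{\|K^t g\|_2^2}{\langle g, g \rangle} &= \frac{\langle \bar K g, g \rangle}{\langle g, g \rangle} && (65)\\
&= \frac{\sum_i a_i^2 \lambda_i}{\sum_i a_i^2} && (66)\\
&\le \left( \frac{\sum_i a_i^2 \lambda_i^n}{\sum_i a_i^2} \right)^{1/n} = \left( \frac{\langle \bar K^n g, g \rangle}{\langle g, g \rangle} \right)^{1/n}, && (67)
\end{aligned}$$

by Jensen's inequality. Multiplying both sides by $\langle g, g \rangle$ yields the lemma. □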

We will also need the following lemma, which was proved by Keith Ball.

Lemma 11 Let $X$ be a random variable taking values in $[0,1]$ and suppose that $E(X) = \mu \le \frac12$.
Then for any $p \ge 1$ we have



V(XP) , * „ . X-u 

— - 1 < /i 1 ~ p - I E . (68) 



Proof: Let $l = \frac12 E(|X - \mu|)$. For a given value of $l$, the l.h.s. of (68) is maximized when $X$ is
concentrated on the three values $0$, $\mu$ and $1$ (because it is a convex function of $X$). Let $p_0$, $p_\mu$ and $p_1$
be the respective probabilities. Then $l = p_1(1 - \mu) = p_0\mu$, and hence $p_\mu = 1 - p_0 - p_1 = 1 - \frac{l}{\mu(1-\mu)}$.
It follows that



1 1 



AtP(l - /i) /z(l - /x) 



< -(^ P -l), 
A* 

since 1 — \i > |, and the proof is complete. □ 

Fix $d_* \le d$. Recall that the $d_*$-truncated Thorp shuffle is the Markov chain with transition
kernel $K_* = K_1 \cdots K_{d_*}$. Let $V$ denote the state space of this chain. Corollary 13 is a consequence
of the following technical lemma.

Lemma 12 Fix $f : V \to [0,1]$. Then there is a universal constant $C \in (0,1)$ such that

ll^/lli<ll/lll +1/Cd2di2 - 

Proof: Suppose that d* = 1. Then the truncated Thorp shuffle makes the distribution uniform 
over V in one step. Thus, 

11*2/113 = Ell/Hi^ ( 69 ) 



I (70) 
< WfWl (71) 
for any $p \in [1,2]$, since $\|f\|_1 \le 1$. Suppose now that $d_* \ge 2$. Let $c$ be the constant appearing in
Corollary 5. We will consider the cases $\|f\|_1 \le 6^{-cd_*}$ and $\|f\|_1 > 6^{-cd_*}$ separately.

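Case 1: $\|f\|_1 \le 6^{-cdd_*^6}$. We show by induction on $d_*$ that $\|K_*^Tf\|_2^2 \le \|f\|_1^{1+1/cdd_*^5}$. The base case $d_* = 1$ is handled by equation (71) above. Now assume that the result holds for $d_* - 1$. Define $\mathcal{L}_*$ as the set of vertices in the cube whose $d_*$th coordinate is 0. Let $B$ denote the collection of subsets $b$ of $\{1,\dots,2^d\}$ such that $X(b) = \mathcal{L}_*$ for some $X \in V$ (i.e., there is a configuration $X \in V$ such that the set of cards occupying $\mathcal{L}_*$ is $b$). For $b \in B$, define $V_b = \{X \in V : X(b) = \mathcal{L}_*\}$. Let $r = \|f\|_1$ and for $A \subset B$, define

$$V_A = \bigcup_{b\in A} V_b.$$

Let

$$H = \left\{ b \in B : \frac{\|f1_{V_b}\|_1}{\|f\|_1} > r^{-1/d_*}|B|^{-1} \right\}.$$

Since $\sum_{b\in B}\frac{\|f1_{V_b}\|_1}{\|f\|_1} = \frac{\|f\|_1}{\|f\|_1} = 1$, Markov's inequality implies that

$$\frac{|H|}{|B|} \le r^{1/d_*}. \tag{72}$$

Let $A = H$ and let $f_1 = f1_{V_A}$ and $f_2 = f1_{V_A^c}$. Then

$$\|K_*^Tf\|_2^2 = \|K_*^Tf_1 + K_*^Tf_2\|_2^2 \le 2\|K_*^Tf_1\|_2^2 + 2\|K_*^Tf_2\|_2^2. \tag{73}$$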

We will bound each term on the right hand side separately. First, consider $\|K_*^Tf_1\|_2^2$. Let $\tilde K$ be the transition kernel for the $d_*$-truncated zigzag shuffle, i.e., $\tilde K = K_1\cdots K_{d_*}K_{d_*}\cdots K_1$. Let $n = cd(d_*-1)^4$. Using Corollary 5 (with $k = 1$) and combining this with equation (72) gives $\tilde K^n(x,V_H) \le \exp(\exp(-1))\,r^{1/d_*}$ for all $x$. Hence

$$\langle\tilde K^nf_1, 1_{V_A}\rangle = \sum_x |V|^{-1}f_1(x)\tilde K^n(x,V_H) \tag{74}$$
$$\le \|f_1\|_1\exp(\exp(-1))\,r^{1/d_*}. \tag{75}$$

Finally, Lemma 10 gives

$$\|K_*^Tf_1\|_2^2 \le \langle f_1,f_1\rangle^{1-1/n}\langle\tilde K^nf_1,f_1\rangle^{1/n} \tag{76}$$
$$\le \langle f_1,f_1\rangle^{1-1/n}\langle\tilde K^nf_1,1_{V_A}\rangle^{1/n}, \tag{77}$$

where the second inequality holds because $f_1 \le 1_{V_A}$. Putting this all together, we get

$$\|K_*^Tf_1\|_2^2 \le \langle f_1,f_1\rangle^{1-1/n}\left[\|f_1\|_1\exp(\exp(-1))\,r^{1/d_*}\right]^{1/n} \tag{78}$$
$$\le 2\left(\frac{\langle f_1,f_1\rangle}{\|f_1\|_1}\right)^{1-1/n}\times\|f_1\|_1\times r^{1/d_*n}, \tag{79}$$

since $\exp(\frac1n\exp(-1)) \le 2$ for all $n$. Since $n = cd(d_*-1)^4$ and $\frac{\langle f_1,f_1\rangle}{\|f_1\|_1} \le 1$, we have

$$\|K_*^Tf_1\|_2^2 \le 2r^{1/cdd_*(d_*-1)^4}\|f\|_1. \tag{80}$$
Next we bound $\|K_*^Tf_2\|_2^2$. Since $K_{d_*}$ is symmetric it contracts $l^2$. Hence

$$\|K_*^Tf_2\|_2^2 \le \|K_{(d_*-1)}\cdots K_1f_2\|_2^2 \tag{81}$$
$$= \sum_{b\in B}\|K_{(d_*-1)}\cdots K_1f_21_{V_b}\|_2^2. \tag{82}$$

Note that $K_1\cdots K_{(d_*-1)}$ is just the transition kernel for a $(d_*-1)$-truncated Thorp shuffle and that the $V_b$ are communicating classes for this process. Thus, we can use the induction hypothesis to bound each $\|K_{(d_*-1)}\cdots K_1f_21_{V_b}\|_2^2$, provided that the corresponding normalized $l^1$ norm $\frac{\|f_21_{V_b}\|_1}{\|1_{V_b}\|_1}$ is sufficiently small. Define $r_b := \frac{\|f_21_{V_b}\|_1}{\|1_{V_b}\|_1}$. We claim that for every $b \in B$ we have $r_b \le r^{\frac{d_*-1}{d_*}}$. To see this, note that if $b \in H$, then $\|f_21_{V_b}\|_1 = 0$ and the claim holds trivially, so assume $b \notin H$. Then

$$\frac{\|f_21_{V_b}\|_1}{\|1_{V_b}\|_1} = \|f_21_{V_b}\|_1|B| \tag{83}$$
$$\le \|f1_{V_b}\|_1|B| \tag{84}$$
$$\le r^{-1/d_*}\|f\|_1 = r^{\frac{d_*-1}{d_*}}, \tag{85}$$

where the first equality holds because $\|1_{V_b}\|_1 = |B|^{-1}$, the second inequality holds because $b \notin H$ (and by the definition of $H$) and the last equality holds because $\|f\|_1 = r$. It follows that

$$r_b \le r^{\frac{d_*-1}{d_*}} \le 6^{-cd(d_*-1)d_*^5} \le 6^{-cd(d_*-1)^6}. \tag{86}$$

Thus we can apply the induction hypothesis, which gives

$$\|K_{(d_*-1)}\cdots K_1f_21_{V_b}\|_2^2 \le r_b^{1/cd(d_*-1)^5}\|f_21_{V_b}\|_1 \tag{87}$$
$$\le r^{1/cdd_*(d_*-1)^4}\|f_21_{V_b}\|_1, \tag{88}$$

where the second inequality follows from the first inequality in (86). Combining this with equation (82) and using the fact that $f_2 \le f$ gives

$$\|K_*^Tf_2\|_2^2 \le r^{1/cdd_*(d_*-1)^4}\|f\|_1. \tag{89}$$

We are now ready to bound $\|K_*^Tf\|_2^2$. Combining equations (89), (80) and (73), we get

$$\|K_*^Tf\|_2^2 \le \left(6r^{1/cdd_*(d_*-1)^4}\right)\|f\|_1. \tag{90}$$

Since $(k-1)^{-4} - k^{-4} > k^{-5}$ for integers $k \ge 2$, the quantity (90) is at most

$$6r^{1/cdd_*^5 + 1/cdd_*^6}\|f\|_1 \le r^{1/cdd_*^5}\|f\|_1,$$

since $r \le 6^{-cdd_*^6}$. This concludes the proof in the case $r \le 6^{-cdd_*^6}$.
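
The elementary inequality used in the last step of Case 1 can be checked exactly with rational arithmetic. The sketch below is only a finite check over a range of k, not a proof; the function names are illustrative. The second function tests the consequence actually needed, namely 1/(k(k-1)^4) >= 1/k^5 + 1/k^6, which is what turns the exponent in (90) into a sum of two exponents.

```python
from fractions import Fraction

def power_gap_holds(k, m):
    # (k-1)^(-m) - k^(-m) > k^(-(m+1)), evaluated exactly
    return Fraction(1, (k - 1) ** m) - Fraction(1, k ** m) > Fraction(1, k ** (m + 1))

def case1_exponent_bound(k):
    # 1/(k (k-1)^4) >= 1/k^5 + 1/k^6, obtained by dividing the m = 4 inequality by k
    return Fraction(1, k * (k - 1) ** 4) >= Fraction(1, k ** 5) + Fraction(1, k ** 6)

print(all(power_gap_holds(k, 4) for k in range(2, 2000)))
print(all(case1_exponent_bound(k) for k in range(2, 2000)))
```

The same helper with m = 12 covers the analogous inequality with exponents 12 and 13 that appears in Case 2.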

Case 2: $r > 6^{-cdd_*^6}$. Let $C$ be an integer that is larger than $2^{15}c^2\cdot 15\log 2\log 6$. We will show by induction on $d_*$ that

$$\|K_*^Tf\|_2^2 \le r^{1+1/Cd^2d_*^{12}}.$$

The base case $d_* = 1$ was handled earlier by equation (71).
Now, fix $d_* \ge 2$ and $f : V \to [0,1]$ and suppose that $r = \|f\|_1 > 6^{-cdd_*^6}$. We can assume w.l.o.g. that $r \le \frac12$. Otherwise, let $h = 1 - f$, and suppose that the result holds for $h$, i.e., for $q = 1/Cd^2d_*^{12}$ we have

$$\|K_*^Th\|_2^2 \le \|h\|_1^{1+q},$$

or equivalently,

$$\|h\|_1 - \|K_*^Th\|_2^2 \ge (1 - \|h\|_1^q)\|h\|_1. \tag{91}$$

Note that

$$\|K_*^Th\|_2^2 = \langle K_*^Th, K_*^Th\rangle \tag{92}$$
$$= \langle K_*^T(1-f), K_*^T(1-f)\rangle \tag{93}$$
$$= \langle K_*^T1,K_*^T1\rangle - 2\langle K_*^T1,K_*^Tf\rangle + \langle K_*^Tf,K_*^Tf\rangle \tag{94}$$
$$= 1 - 2\|f\|_1 + \|K_*^Tf\|_2^2, \tag{95}$$

where the third equality holds because $K_*^T$ is doubly stochastic and hence $K_*^T1 = 1$. Thus

$$\|h\|_1 - \|K_*^Th\|_2^2 = \|f\|_1 - \|K_*^Tf\|_2^2. \tag{96}$$

Define $u : [0,1] \to \mathbf{R}$ by

$$u(x) = (1 - x^q)x = \frac{x(1-x)}{1 + x^q + x^{2q} + \cdots + x^{1-q}}, \tag{97}$$

so the RHS of (91) is $u(\|h\|_1)$. Since the numerator on the RHS of (97) is symmetric about $\frac12$ and the denominator is increasing, we have $u(x) \ge u(1-x)$ if $x \le \frac12$. This, combined with equation (96), shows that equation (91) is still true if we replace the $h$ by $f$. Thus we can assume henceforth that $r \le \frac12$.
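
The symmetry argument above can be illustrated numerically. The sketch assumes u(x) = (1 - x^q) x with q = 1/m for a positive integer m (in the text 1/q = C d^2 d_*^12 is an integer because C is), so that u(x) also equals x(1 - x) divided by the geometric sum 1 + x^q + ... + x^(1-q). The names `u` and `u_factored` and the grid of test points are illustrative.

```python
def u(x, m):
    # u(x) = (1 - x^q) x with q = 1/m
    q = 1.0 / m
    return (1.0 - x ** q) * x

def u_factored(x, m):
    # same function written as x(1 - x) / (1 + x^q + ... + x^(1 - q))
    q = 1.0 / m
    denominator = sum(x ** (j * q) for j in range(m))
    return x * (1.0 - x) / denominator

grid = [i / 200 for i in range(1, 101)]          # points in (0, 1/2]
for m in (3, 7, 50):
    same = all(abs(u(x, m) - u_factored(x, m)) < 1e-12 for x in grid)
    larger_on_left = all(u(x, m) >= u(1 - x, m) - 1e-12 for x in grid)
    print(m, same, larger_on_left)
```

The second check is the comparison u(x) >= u(1 - x) for x <= 1/2 that allows passing from h = 1 - f back to f.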



Let $B$ and $V_b$ be as defined above. Then

$$\|K_*^Tf\|_2^2 \le \|K_{(d_*-1)}\cdots K_1f\|_2^2 \tag{98}$$
$$= \sum_{b\in B}\|K_{(d_*-1)}\cdots K_1f1_{V_b}\|_2^2. \tag{99}$$

For $b \in B$, define $r_b = \|f1_{V_b}\|_1|B|$. We may assume that

$$\|K_{(d_*-1)}\cdots K_1f1_{V_b}\|_2^2 \le r_b^{1/Cd^2(d_*-1)^{12}}\|f1_{V_b}\|_1.$$

(In the case where $r_b \le 6^{-cd(d_*-1)^6}$ this was proved earlier, since $cd(d_*-1)^5 \le Cd^2(d_*-1)^{12}$; in the case where $r_b > 6^{-cd(d_*-1)^6}$ this is the induction hypothesis.) Combining this with (99) gives

$$\|K_*^Tf\|_2^2 \le \sum_{b\in B}r_b^{1/Cd^2(d_*-1)^{12}}\|f1_{V_b}\|_1 = |B|^{-1}\sum_{b\in B}r_b^{1+1/Cd^2(d_*-1)^{12}}.$$



Thus, unless

$$|B|^{-1}\sum_{b\in B}r_b^{1+1/Cd^2(d_*-1)^{12}} > r^{1+1/Cd^2d_*^{12}}, \tag{100}$$

the result is immediate. So assume that (100) holds. For $b \in B$, define $w_b = \frac{r_b}{|B|r}$. Note that $\sum_{b\in B}w_b = 1$. Let $U$ be chosen uniformly at random from $B$. Let $p = 1 + 1/Cd^2(d_*-1)^{12}$. Dividing both sides of (100) by $r^p$ gives

$$\frac{E(r_U^p)}{r^p} > r^{1/Cd^2d_*^{12} - 1/Cd^2(d_*-1)^{12}}. \tag{101}$$

Using the inequality $k^{-12} - (k-1)^{-12} \le -k^{-13}$, valid for integers $k \ge 2$, and subtracting 1 from both sides of (101) gives

$$\frac{E(r_U^p)}{r^p} - 1 \ge r^{-1/Cd^2d_*^{13}} - 1. \tag{102}$$

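Let $\pi$ be the uniform probability measure on $B$ and let $\nu$ be the measure on $B$ defined by the $w_b$. Define

$$\|\pi - \nu\|_{TV} = \frac12\sum_{b\in B}\left|w_b - |B|^{-1}\right| = \frac12 E\left|\frac{r_U - r}{r}\right|.$$

Note that $E(r_U) = |B|^{-1}\sum_{b\in B}r_b = r$. Plugging $X = r_U$ and $\mu = r$ into Lemma 11 and combining with equation (102) gives

$$2\|\pi-\nu\|_{TV} \ge \frac{r^{-1/Cd^2d_*^{13}} - 1}{r^{-1/Cd^2(d_*-1)^{12}} - 1} \tag{103}$$
$$= \frac{\exp\left(\frac{-\log r}{Cd^2d_*^{13}}\right) - 1}{\exp\left(\frac{-\log r}{Cd^2(d_*-1)^{12}}\right) - 1}. \tag{104}$$

Since $r > 6^{-cdd_*^6}$, the quantities in the exponents in (104) are in $(0,\frac12]$. (Recall that $C$ is much larger than $c$.) Hence, the fact that $\frac{e^x-1}{x} \in [1,2]$ whenever $x \in (0,\frac12]$ implies that the quantity in (104) is at least

$$\frac{(d_*-1)^{12}}{2d_*^{13}} \ge \frac{d_*^{-1}}{2^{13}},$$

where the inequality holds because $d_* \ge 2$ and hence $\frac{d_*-1}{d_*} \ge \frac12$. It follows that $\|\pi-\nu\|_{TV} \ge 2^{-14}d_*^{-1}$.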
Note that

$$2\|\pi-\nu\|_{TV} = \sum_{b\in B}\left[\max(\nu(b),\pi(b)) - \min(\nu(b),\pi(b))\right]; \tag{105}$$
$$2 = \sum_{b\in B}\left[\max(\nu(b),\pi(b)) + \min(\nu(b),\pi(b))\right]. \tag{106}$$

Subtracting the first equation from the second and dividing by 2 gives

$$1 - \|\pi-\nu\|_{TV} = \sum_{b\in B}\min(\nu(b),\pi(b)). \tag{107}$$
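
The identity (107) between total variation distance and the overlap of two probability vectors is easy to confirm on an example. In the sketch below the list `r_b` stands in for the normalized masses of f on the classes V_b, and the weights are formed as r_b divided by |B| times their mean, mirroring the construction in the text. The specific numbers and function names are made up for illustration.

```python
def tv_distance(nu, pi):
    # total variation distance: half the l1 distance between the two vectors
    return 0.5 * sum(abs(a - b) for a, b in zip(nu, pi))

def overlap(nu, pi):
    # sum over b of min(nu(b), pi(b))
    return sum(min(a, b) for a, b in zip(nu, pi))

r_b = [0.05, 0.40, 0.10, 0.25, 0.00, 0.30]       # hypothetical values in [0, 1]
size = len(r_b)
r = sum(r_b) / size                              # mean of r_U for uniform U
nu = [x / (size * r) for x in r_b]               # weights w_b, summing to 1
pi = [1.0 / size] * size                         # uniform measure on B

print(abs(sum(nu) - 1.0) < 1e-12)
print(abs((1.0 - tv_distance(nu, pi)) - overlap(nu, pi)) < 1e-12)
```

A lower bound on the total variation distance therefore translates directly into an upper bound on the overlap, which is how (107) is used in the estimate that follows.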

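Recall that $\tilde K$ is the transition kernel for the $d_*$-truncated zigzag shuffle. Note that

$$\langle f,\tilde K^nf\rangle = \sum_{b\in B}\langle f1_{V_b},(\tilde K^nf)1_{V_b}\rangle \tag{108}$$
$$\le \sum_{b\in B}\min\left(\|f1_{V_b}\|_1,\|(\tilde K^nf)1_{V_b}\|_1\right) \tag{109}$$
$$= r\sum_{b\in B}\min\left(w_b,\frac{\|(\tilde K^nf)1_{V_b}\|_1}{r}\right), \tag{110}$$

where the inequality holds because $f1_{V_b} \le 1$ and $(\tilde K^nf)1_{V_b} \le 1$. Let $n = 15cdd_*^5\log 2$. Corollary 5 implies that $\frac{\|(\tilde K^nf)1_{V_b}\|_1}{r} \le a(d_*)|B|^{-1}$, where $a(k) := \exp(2^{-15k})$. Hence,

$$\langle f,\tilde K^nf\rangle \le r\,a(d_*)\sum_{b\in B}\min(w_b,|B|^{-1}) \tag{111}$$
$$= r\,a(d_*)\left(1 - \|\nu-\pi\|_{TV}\right) \tag{112}$$
$$\le r\,a(d_*)\left(1 - 2^{-14}d_*^{-1}\right). \tag{113}$$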



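Hence Lemma 10 gives

$$\|K_*^Tf\|_2^2 \le \langle f,f\rangle^{1-1/n}\left[r\,a(d_*)\left(1-2^{-14}d_*^{-1}\right)\right]^{1/n} \tag{114}$$
$$\le r\times a(d_*)^{1/n}\times\left(1-2^{-14}d_*^{-1}\right)^{1/n} \tag{115}$$
$$\le r\exp\left(\frac{2^{-15d_*} - 2^{-14}d_*^{-1}}{n}\right) \tag{116}$$
$$\le r\exp\left(-\frac{1}{2^{15}\cdot 15\,cdd_*^6\log 2}\right), \tag{117}$$

since $\langle f,f\rangle \le r$ and $2^{-15k} \le 2^{-15}k^{-1}$ for all positive integers $k$. Finally, since $r > 6^{-cdd_*^6} = \exp(-cdd_*^6\log 6)$, we have $r^{1/Cd^2d_*^{12}} \ge \exp\left(-\frac{1}{2^{15}\cdot 15\,cdd_*^6\log 2}\right)$. (Recall that $C > 2^{15}c^2\cdot 15\log 2\log 6$.) It follows that $\|K_*^Tf\|_2^2 \le r^{1+1/Cd^2d_*^{12}}$. This completes the proof.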



To bound the root profile, we actually used the following corollary. 
Corollary 13 Fix $S \subset V$ and let

$$x = \frac{|S|}{|V|}.$$

Let $\{p(x,y)\}$ be the transition probabilities for a round of the Thorp shuffle. Then there is a universal constant $C > 0$ such that

$$\|p(S,\cdot)\|_2^2 \le x^{1+1/Cd^{14}}.$$

Proof: Let $f = 1_S$ and $d_* = d$ and apply Lemma 12. (Note that if $K$ is the transition kernel for a round of the Thorp shuffle then $p(S,\cdot) = K^Tf$.)

□
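
To make the quantity in Corollary 13 concrete, the following sketch builds the one-round Thorp shuffle kernel on a deck of 4 cards (d = 2), computes p(S, .) as the transpose of the kernel applied to the indicator of S, and checks the elementary bounds x^2 <= ||p(S, .)||_2^2 <= x, with the norm taken under the uniform probability measure. It does not attempt to verify the exponent in the corollary, whose constant C is not explicit. The convention for the order in which cards are dropped, the choice of S and all function names are assumptions made for illustration.

```python
from itertools import permutations, product

def thorp_outcomes(deck):
    # all equally likely results of one Thorp shuffle of `deck` (tuple of even length)
    half = len(deck) // 2
    left, right = deck[:half], deck[half:]
    results = []
    for flips in product((0, 1), repeat=half):
        out = []
        for a, b, flip in zip(left, right, flips):
            out.extend((a, b) if flip == 0 else (b, a))   # coin decides the order of the pair
        results.append(tuple(out))
    return results

def thorp_kernel(n_cards):
    states = list(permutations(range(n_cards)))
    index = {s: i for i, s in enumerate(states)}
    K = [[0.0] * len(states) for _ in states]
    for s in states:
        outs = thorp_outcomes(s)
        for t in outs:
            K[index[s]][index[t]] += 1.0 / len(outs)
    return states, K

def mass_from_set(K, S):
    # p(S, y) = sum over x in S of p(x, y), i.e. (K^T 1_S)(y)
    return [sum(K[x][y] for x in S) for y in range(len(K))]

def sq_norm(v):
    # squared l2 norm with respect to the uniform probability measure
    return sum(a * a for a in v) / len(v)

states, K = thorp_kernel(4)
S = range(6)                                   # an arbitrary set of 6 of the 24 orderings
x = len(S) / len(states)
value = sq_norm(mass_from_set(K, S))
print(x ** 2 <= value <= x)
```

The upper bound x is what one gets from the fact that a doubly stochastic kernel contracts the 2-norm; the corollary improves the exponent from 1 to slightly more than 1.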